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Abstract 

Sampling from the lattice Gaussian distribution is emerging as an important problem in coding and cryptography. 

In this paper, the classic Metropolis-Hastings (MH) algorithm from Markov chain Monte Carlo (MCMC) methods 
is adapted for lattice Gaussian sampling. Two MH-based algorithms are proposed, which overcome the restriction 
suffered by the default Klein’s algorithm. The first one, referred to as the independent Metropolis-Hastings-Klein 
(MHK) algorithm, tries to establish a Markov chain through an independent proposal distribution. We show that 
the Markov chain arising from the independent MHK algorithm is uniformly ergodic, namely, it converges to the 
stationary distribution exponentially fast regardless of the initial state. Moreover, the rate of convergence is explicitly 
calculated in terms of the theta series, leading to a predictable mixing time. In order to further exploit the convergence 
potential, a symmetric Metropolis-Klein (SMK) algorithm is proposed. It is proven that the Markov chain induced by 
the SMK algorithm is geometrically ergodic, where a reasonable selection of the initial state is capable to enhance 
the convergence performance. 

Keywords:Lattice Gaussian sampling, lattice coding, lattice decoding, MCMC methods. 

I. Introduction 

Recently, the lattice Gaussian distribution is emerging as a common theme in various research fields. In mathe¬ 
matics, Banaszczyk firstly applied it to prove the transference theorems for lattices m. In coding, lattice Gaussian 
distribution was employed to obtain the full shaping gain for lattice coding |2), ||3l|, and to achieve the capacity of the 
Gaussian channel and the secrecy capacity of the Gaussian wiretap channel, respectively S3J, Q. In cryptography, the 
lattice Gaussian distribution has already become a central tool in the construction of many primitives. Specifically, 
Micciancio and Regev used it to propose lattice-based cryptosystems based on the worst-case hardness assumptions 
0. Meanwhile, it also has underpinned the fully-homomorphic encryption for cloud computing 0. Algorithmically, 
lattice Gaussian sampling with a suitable variance allows to solve the shortest vector problem (SVP) and the closest 
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vector problem (CVP); for example, it has led to efficient lattice decoding for multi-input multi-output (MIMO) 
systems (8j, (51. In theory, it has been demonstrated that lattice Gaussian sampling is equivalent to CVP via a 
polynomial-time dimension-preserving reduction ed, and SVP is essentially a special case of the CVP 

Due to the central role of the lattice Gaussian distribution playing in these fields, its sampling algorithms become 
an important computational problem. In contrast to sampling from a continuous Gaussian distribution, it is by no 
means trivial to perform the sampling even from a low-dimensional discrete Gaussian distribution. As the default 
sampling algorithm for lattices, Klein’s algorithm 1111 samples within a negligible statistical distance from the lattice 
Gaussian distribution if and only if the standard deviation er > ^oj(\og n) •max 1 < i < ra ||b i || lfl2l . where w(log n ) is 
a superlogarithmic function, n denotes the lattice dimension and b,’s are the Gram-Schmidt vectors of the lattice 
basis B. However, such requirement of a can be excessively large, rendering Klein’s algorithm inapplicable to 
many scenarios of interest. 

Markov chain Monte Carlo (MCMC) methods attempt to sample from the target distribution by building a 
Markov chain, which randomly generates the next sample conditioned on the previous samples. After a burn-in 
period, which is normally measured by the mixing time, the Markov chain will reach a stationary distribution, and 
successful sampling from the complex target distribution can be carried out. The complexity of the each Markov 
move is often insignificant. Therefore, we are mostly concerned with the mixing time that the underlying Markov 
chain needs to get steady. 

To this end, the Gibbs algorithm was introduced into lattice Gaussian sampling, which employs univariate 
conditional sampling to build a Markov chain lfl3l . It is able to sample beyond the range that Klein’s algorithm. In 
El, a flexible block-based Gibbs algorithm was introduced, which performs the sampling over multiple elements 
within a block. In this way, the correlation within the block could be exploited, leading to faster convergence 
especially in the case of highly correlated components. Unfortunately, the related analysis of the convergence rate 
for the associated Markov chain in these two algorithms was lacking, resulting in an unpredictable mixing time. 
Actually, according to coupling technique m, it is easy to know that the Gibbs-based MCMC sampler converges 
exponentially fast for any finite state space Markov chain. However, as the Markov chain associated with lattice 
Gaussian sampling naturally has a countably infinite state space rather than a finite one, its convergence is still 
unknown. 

On the other hand, Gibbs sampling has already been introduced into multiple-input multiple-output (MIMO) 
communications for signal detection El-lEol. In particular, the selection of a (also referred to as “temperature”) 
is studied in m and it is argued that a should grow as fast as the signal-to-noise ratio (SNR) in general for 
fast mixing. In |fl6l . a mixed-Gibbs sampler is proposed to achieve the near-optimal detection performance, which 
takes the benefits of an efficient stopping criterion and a multiple restart strategy. Moveover, Gibbs sampling is 
also introduced into the soft-output decoding in MIMO systems, where the extrinsic information calculated by a 
priori probability (APP) detector is used to produce soft outputs ED. In ED, the investigation of the Gibbs-based 
MCMC receivers in different communication channels are given. Due to the finite state space formed by a finite 
modulation constellation, those Gibbs samplers converge geometrically fast to the stationary distribution. However, 
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the rate of convergence has not been determined. 

In this paper, another famous MCMC scheme, known as the Metropolis-Hastings (MH) algorithm 1 21I I. is studied 
in details for lattice Gaussian sampling. In particular, it makes use of a proposal distribution which suggests a 
possible state candidate and then employs a acceptance-rejection rule to decide whether to accept the suggested 
candidate in the next Markov move. Obviously, the art of designing an efficient MH algorithm chiefly lies in 
choosing an appropriate proposal distribution, and this motivates us to design the target proposal distributions based 
on Klein’s algorithm. 

In the proposed independent Metropolis-Hastings-Klein (MHK) algorithm, through Klein’s algorithm, a candidate 
at each Markov move is generated from a Gaussian-like proposal distribution. In this case, we show that the Markov 
chain induced by the proposed algorithm is uniformly ergodic, namely, it converges to the stationary distribution 
exponentially fast irrespective of the starting state. Further analysis of its convergence rate is then performed, where 
the convergence rate can be explicitly estimated given the lattice basis B, query point c and standard derivation 
er. Thus, the mixing time of the proposed algorithm becomes tractable. Note that the proposed independent MHK 
algorithm is well applicable to MIMO detection, thus making the convergence rate as well as the mixing time 
accessible. To the best of our knowledge, this is the first time that the convergence rate of MCMC in communications 
and signal processing can be determined analytically since MCMC was introduced into this field in 1990s (22). 

Different from the algorithms in |23l , Il24l which have an exponential lower bound 2 n,/2 on the space and time 
complexity, the proposed independent MHK algorithm has polynomial space complexity, and its time complexity 
varies with er, where a larger value of er corresponds to smaller mixing time. This is in accordance with the fact 
we knew before; if er is large enough, then there is no need of MCMC in lattice Gaussian sampling since Klein’s 
algorithm can be applied directly with polynomial time complexity. 

The second proposed algorithm, named as symmetric Metropolis-Klein (SMH) algorithm, establishes a symmetric 
proposal distribution between two consecutive Markov states. We show it also converges to the stationary distribution 
exponentially fast while the selection of the initial state also plays a role. Such a case is referred to as geometric 
ergodicity in MCMC literature H25| . Besides the geometric ergodicity, another advantage of the proposed SMH 
algorithm lies in its remarkable elegance and simplicity, which comes from the usage of a symmetrical proposal 
distribution. 

To summarize, the main contributions of this paper are the following: 

1) The independent MHK algorithm is proposed for lattice Gaussian sampling, where the Markov chain 
arising from it converges exponentially fast to the stationary distribution. 

2) The convergence rate in the independent MHK algorithm is explicitly estimated by theta series, thereby 
leading to a tractable mixing time of the underlying Markov chain. 

3) The SMH algorithm is further proposed for lattice Gaussian sampling, which not only achieves an 
exponential convergence performance, but also is greatly simplified due to its symmetry. 

The rest of this paper is organized as follows. Section II introduces the lattice Gaussian distribution and briefly 
reviews the basics of MCMC methods. In Section III, we propose the independent MHK algorithm for lattice 
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Fig. 1. Illustration of a two-dimensional lattice Gaussian distribution. 


Gaussians, where uniform ergodicity is demonstrated. In Section IV, the convergence rate of the independent MHK 
algorithm is analyzed and explicitly calculated in terms of the theta series. In Section V, the proposed symmetric 
random walk MH algorithm for lattice Gaussian sampling is given, followed by the demonstration of geometric 
ergodicity. Finally, Section VI concludes the paper. 

Notation: Matrices and column vectors are denoted by upper and lowercase boldface letters, and the transpose, 
inverse, pseudoinverse of a matrix B by B T ,B -1 , and B' , respectively. We use b, for the /th column of the 
matrix B, b, for the /th Gram-Schmidt vector of the matrix B, bij for the entry in the /th row and yth column 
of the matrix B. [xj denotes rounding to the integer closest to x. If x is a complex number, \x\ rounds the real 
and imaginary parts separately. Finally, in this paper, the computational complexity is measured by the number of 
arithmetic operations (additions, multiplications, comparisons, etc.). 

II. Preliminaries 

In this section, we introduce the background and mathematical tools needed to describe and analyze the proposed 
lattice Gaussian sampling algorithms. 

A. Lattice Gaussian Distribution 

Let B = [bi,..., b„] C R'" consist of n linearly independent vectors. The //-dimensional lattice A generated by 
B is defined by 

A = {Bx : x € Z n }, (1) 

where B is known as the lattice basis. We define the Gaussian function centered at c £ R" for standard deviation 
cr > 0 as 

||z-c || 2 

p ac (z)=e ^ , (2) 
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Algorithm 1 Klein’s Algorithm 


Input: B.ct, c 
Output: Bx € A 

1: let B = QR and c' = Q^c 
2: for i = n, ..., 1 do 

3: let tJj = i- 2 -, and x; = 

In i I * 

4: sample Xi from D z,cri,xi 

5: end for 

6: return Bx 


for all z £ R". When c or a are not specified, we assume that they are 0 and 1 respectively. Then, the discrete 
Gaussian distribution over A is defined as 


Pa,<t,c ( x ) 


Pc t,c(Bx) 

P(7,c(A) 


e 2 ct 


IIBx- 


E, 


e 


IBx—ell 2 


(3) 


for all Bx £ A, where p a ^ c (h) = X^Bx 6 aP<t,c(Bx) is just a scaling to make a probability distribution. 

We remark that this definition differs slightly from the one in (6), where a is scaled by a constant factor \/2n 
(i.e., s = \[2i rcr). Fig. 1 illustrates the discrete Gaussian distribution over Z 2 . As can be seen clearly, it resembles 
a continuous Gaussian distribution, but is only defined over a lattice. In fact, discrete and continuous Gaussian 
distributions share similar properties, if the flatness factor is small 0. 


B. Klein’s Algorithm 


Obviously, an intuition of D\ }Cr , c (x) suggests that a lattice point Bx closer to c will be sampled with a higher 
probability. Therefore, sampling from the lattice Gaussian distribution can be naturally used to solve the CVP 
(where c is the query point) and SVP (where c = 0) in lattices. Because of this, Klein’s algorithm that samples 
from a Gaussian-like distribution was originally designed for lattice decoding El. and small size of cr is preferred 
for the consideration of the decoding efficiency. As shown in Algorithm 1, the operation of the Klein’s algorithm 
has polynomial complexity 0(n 2 ) excluding QR decomposition. More precisely, by sequentially sampling from 
the 1-dimensional conditional Gaussian distribution in a backward order from x n to x\, the Gaussian-like 

distribution arising from Klein’s algorithm is given by 


( Rlcin (x) 


TTn > At.c(Bx) 

I I Dj l lJi Xi (Xi) — „„ . 

1L=1 P<x„3f«(Z) 


e 20 


IIBx—ell 


nr=iEx ( 


(4) 


where R,- = c » ‘+ 1 ; a . = -_2_ ) c ’ — Qt c anc ) jj = QR. | n || 1 21 . it has been demonstrated that PKi e in( x ) 

r i,i \ r i,i I 

is close to a c (x) within a negligible statistical distance if 


a > s/u )(log n) ■ maxi<j< n ||bj||. 


(5) 
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However, even with the help of lattice reduction (e.g., LLL reduction), the size of max i < ! ;<„||b,|| still can be 
excessively large. 

C. MCMC Methods 

As for the lattice Gaussian sampling in the range a < y/w {log n)-maxi<j<„||bj||, MCMC methods have become 
an alternative solution, where the discrete Gaussian distribution Da, J jC is viewed as a complex target distribution 
lacking direct sampling methods. By establishing a Markov chain that randomly generates the next state based on 
the previous state, MCMC is capable of sampling from the target distribution of interest, thereby removing the 
restriction on a 03. 

As an important parameter which measures the time required by a Markov chain to get close to its stationary 
distribution, the mixing time is defined as 1261 

<mix(e) = rnin{f : maxUP^x, •) - 7r(-)|| T v < e}, ( 6 ) 

where || • || tv represents the total variation distance (other measures of distance also exist 01 ). It is well known 
that the spectral gap 7 of the transition matrix offers a good upper bound on the mixing time, that is 

fmix(e) < -log ( — ) , (7) 

7 V 71-mine/ 

where 7 r m j n = min xg f 27 r(x), LI stands for the state space, 7 = 1 — |Ai| >0 and Ai represents the second largest 
eigenvalue of the transition matrix P in a Markov chain. Therefore, a large value of the spectral gap leads to rapid 
convergence to stationarity (27). 

However, the spectrum of the Markov chain is very hard to analyze, especially the state space Ll tends to be 
exponentially large, making it difficult to have a compact, mathematical representation for the adjacency matrix. 
Thanks to the celebrated coupling technique, for any Markov chain with finite state space 12, exponentially fast 
convergence can be demonstrated if the underlying Markov chain is irreducible and aperiodic with an invariant 
distribution tt (26). Nevertheless, in the case of lattice Gaussian sampling, the countably infinite state space x £ Z” 
naturally becomes a challenge. To this end, we perform the convergence analysis from the beginning — ergodicity 

flU). 


Definition 1. Let P be an irreducible and aperiodic transition matrix for a Markov chain. If the chain is positive 
recurrent, then it is ergodic, namely, there is a unique probability distribution tt on LI and for all x £ LI, 

lim ||P*(x, •) - 7 r|| Ty = 0, ( 8 ) 

t—f OO 

where P 4 (x; •) denotes a row of the transition matrix P for t Markov moves. 

Although ergodicity implies asymptotic convergence to stationarity, it does not say anything about the convergence 
rate. To this end, the following definitions are given (28). 
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Definition 2. A Markov chain having stationary distribution tt(-) is uniformly ergodic if there exists 0 < 5 < 1 
and M < oc such that for all x 


||P t (x,-)- 7 r(-)||Ty <M(l-<5) 4 . (9) 

Obviously, the exponential decay coefficient 6 is key to determine the convergence rate. As M is a constant, 
the convergence rate does not depend on the initial state x. As a weaker version of uniform ergodicity, geometric 
ergodicity also converges exponentially as well, but M is parameterized by the initial state x. 

Definition 3. A Markov chain having stationary distribution 7r(-) is geometrically ergodic if there exists 0 < <5 < 1 
and M(x) < oo such that for all x 


||P*(x, •) - tt(-)\\ tv < M (x)(1 - <5)*. (10) 

Besides exponential convergence, polynomial convergence also exists l29t . which goes beyond the scope of this 
paper due to the inefficient convergence performance. Unless stated otherwise, the state space of the Markov chain 
we are concerned with throughout the context is countably infinite, i.e., O = Z”. 


D. Classical MH Algorithms 

The origins of the Metropolis algorithm can be traced back to the celebrated work of l30l in 1950’s. In 12 1II . 
the original Metropolis algorithm was successfully extended to a more general scheme known as the Metropolis- 
Hastings (MH) algorithm. In particular, let us consider a target invariant distribution n together with a proposal 
distribution q(x,y). Given the current state x for Markov chain X t , a state candidate y for the next Markov move 
X t+ i is generated from the proposal distribution q(x, y). Then the acceptance ratio a is computed by 


a(x, y) = min <M 


7r (y)5 , (y, x ) 


( 11 ) 


7r(x)g(x,y) 

and y will be accepted as the new state by X t+ i with probability a. Otherwise, x will be retained by X t+ i. In 
this way, a Markov chain {Xq, Xi, ...} is established with the transition probability P(x, y) as follows: 


, , U(x,y)a(x,y) if y ^ x, 

P(x,y) = ^ (12) 

|i-E z ^x9( x - z W x > z ) ify = x. 

It is interesting that in MH algorithms, the proposal distribution q(x, y) can be any fixed distribution from which 
we can conveniently draw samples. Undoubtedly, the fastest converging proposal density would be q(x, y) = 7r(y), 
but in the context it is assumed that 7r cannot be sampled directly. To this end, many variations of MH algorithms 
with different configurations of q(x, y) were proposed. Nevertheless, how to generate the candidate state y from 
the proposal distribution q tends to be difficult for MH algorithms in high dimensions. 
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III. Independent MHK Algorithm 


In this section, the independent Metropolis-Hastings-Klein (MHK) algorithm for lattice Gaussian sampling is 
firstly presented. Then, we show that the Markov chain induced by the proposed algorithm is uniformly ergodic. 


A. Independent MHK Algorithm 


In the proposed independent MHK algorithm, Klein’s sampling is used to generate the state candidate y for the 
each Markov move Xt+i. As shown in Algorithm 2, it consists of three basic steps: 

1) Sample from the independent proposal distribution through Klein’s algorithm to obtain the candidate state y 
for X t+ i, 


g( x ,y) 


-^Klein(y) — 


e 2a 


fac(By) 

IL=1 P<ri,Vi@d 

|By-c || 2 


EUl SjfreZ 1 

«(y). 


WVi-ViW 


(13) 


where y e Z", y t = c ‘ a, = d = Q'c and B = QR. 

2) Calculate the acceptance ratio a(x,y) 


a(x,y) 


mint 1 


7r(y)g(y,x) \ 

7r ( x )g( x i y) I 


= min 


i, 


7r(y)g( x ) l 
7r( x )g(y) i 


mint 1 


n LlPriMZ)) 


(14) 


where 7r = D\ a c . 

3) Make a decision for X t+ i based on a(x, y) to accept X t+ i = y or not. 

A salient feature of the independent MHK algorithm is that the generation of the state candidate y is independent 
of the previous one, which is completely accomplished by Klein’s algorithm. Therefore, the connection between 
two consecutive Markov states only lies in the decision part, resulting in an independent MCMC sampler. 


Theorem 1. Given the target lattice Gaussian distribution a the Markov chain induced by the independent 
MHK algorithm is ergodic: 


lim ||-P*(x; ■) 


-Da,<t,c(-)I|tv = 0 


for all states x € Z". 

The proof of Theorem 1 is provided in Appendix |A] 


(15) 
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B. Uniform Ergodicity 

Lemma 1. In the independent MHK algorithm for lattice Gaussian sampling, there exists S > 0 such that 


9 


for x € Z”. 

Proof: Using (0 and 0, we have 

g(x) 

7T(X) 


(a) 

> 

where (a) holds due to the fact that 0 

/W(Z)<p CTi (Z)4^ e ~^ 2 . (18) 

As can be seen clearly, the right-hand side (RHS) of (fT71 > is completely independent of x, meaning it can be 
expressed as a constant 5 determined by basis B, query point c and standard deviation a. Therefore, the proof is 
completed. ■ 

We then arrive at a main Theorem to show the uniform ergodicity of the proposed algorithm. 

Theorem 2. Given the invariant lattice Gaussian distribution a the Markov chain established by the inde¬ 
pendent MHK algorithm is uniformly ergodic: 

||P‘(x,-)-£>A,a,c(-)l|TV < (1-5)* (19) 


q(y 


7T(X) 


> S 


(16) 


Pa,c(Bx) 


Pa. c (A) 

nr=i Pi, c( Bx ) 

Pa, c (A) 

nr=i PaiM%) 

Pa,c (-^-) 


Yi'UPaM) 


= 6 


(17) 


for all x G Z". 


Proof: 


Based on (IT3l > and (IT4l) . the transition probability P(x, y) of the independent MHK algorithm are given by 


min 


^(x,y) = 


in[<?(y), 


£(y\Mx) \ 

7T(x) j 


ify^x, 


g( x )+E max {°'g( z )^ y =x - 


( 20 ) 


z^x 


Using (fl~6l> in Lemma 1, it is straightforward to check that the following relationship holds 


P(x, y) > 57r(y) 


( 21 ) 


for all cases of xj £ Z n , which indicates all the Markov transitions have a component of size 5 in common. 
Based on (12Tb . the coupling technique is applied to proceed the following proof [31|. 
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Algorithm 2 Independent Metropolis-Hastings-Klein Algorithm for Lattice Gaussian Sampling 
Input: B.ct, c,X 0 

Output: samples from the target distribution 7r = D,\. a .c. 
l: for t =1,2, ..., do 
2: let x denote the state of X t _i 

3: generate y from the proposal distribution q(x, y) in (IT3l) 

4: calculate the acceptance ratio a(x, y) in (fl4l > 

5: generate a sample u from the uniform density U[ 0,1] 

6: if u < a(x, y) then 

7: let X t = y 

8 : else 

9: X t = X 

10: end if 

11: if Markov chain goes to steady then 

12: output the state of X t 

13: end if 

14: end for 


Specifically, assume two independent Markov chains X t and Y t , and both of them marginally update according 
to the transition probability P(x,y). More precisely, Y t is supposed to start from the stationary distribution n, and 
X t starts from some initial distribution 7r (J , which is not yet steady. 

In principle, any coupling of Markov chains can be modified so that the two chains stay together at all times 
once they meet at a same state |[26l . Moreover, according to coupling inequality (25l . the variation distance || • ||tv 
between distributions formed by Markov chains X t and Y t is upper bounded by the probability that they are not 
coupled. Therefore, we have 

||P 4 (x, ■) - 7 t(-)||tv < P(Xt ^ Y t ). (22) 

As Xj and Y t follow the transition probability 

-P(x,y) > 6- F(y) (23) 

respectively, at each Markov move, X t and Y t have the probability at least S to visit the same state from the 
probability measure 7r(-). Once they meet, they stay together thereafter by getting coupled. Put it another way, the 
probability of X t and Yt getting coupled in every single Markov move respectively is lower bounded by 

= Y<_!_*) > S. (24) 

Therefore, during t Markov moves, the probability of two independent Markov chains X and Y not getting coupled 
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can be derived as 


P(X t ^Y t ) = JJ(1 — P(X.i-l-yi = 

2=1 

= (1 - P(X i _ 1 _ i = Yi-i-rt)* 

< (1 - < 5 )*- 


Then, by substituting (1251) into d22l . we obtain 




(25) 


(26) 


completing the proof. 


Obviously, given the value of S < 1, the mixing time of the Markov chain can be calculated by ([ 6 ]) and ( l26l >. 
that is 

imix(e) = ln(l — 5) < (_lnC) ' G) ’ 6 < 1 (2?) 

where we use the bound ln(l — 6) < —S for 0 < 6 < 1. Therefore, the mixing time is proportional to 1/8, and 
becomes 0 ( 1 ) if 8 —> 1 . 

Here, we point out that the aforementioned spectral gap 7 of the transition matrix can also be used to bound the 
mixing time. However, as shown below, the upper bound offered is not as tight as that of the coupling technique. 


Proposition 1. In the independent MHK algorithm, the spectral gap 7 of the transition matrix is lower hounded 
as 



(28) 


Proof: The proof of Proposition 1 is provided in Appendix B. 

Therefore, substituting d28l > into 0 yields another upper bound of the mixing time 


fmix(e) < — log(7T min e) • 



which is much looser than that in (f27l) . 


e < 1, 


(29) 


C. Convergence Rate in General Cases (a f o) 


In the proposed independent MHK algorithm, by default, the standard deviation of the proposal distribution q is 
set the same as er, namely, a = o. Therefore, a natural question is whether a flexible standard deviation 77 f 0 still 
works. For this reason, in what follows, the relationship between a and a is investigated. 

Without loss of generality, by substitution, the corresponding ratio of q(pc)/ 7r(x) in (fTTb can be rewritten as 


9( X ) ^ Per, c(A) -) || Bx — c || 2 

- > - • P y V 2W 2 2er 2 n ' 11 

*(x) - n?=iP*(z) 


(30) 
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Unfortunately, in the case of a < a, as ||Bx — c|| could go to infinity, it is impossible to determine a constant 
lower bound upon q{x.)/ 7r(x) for x G Z™, implying the uniform ergodicity can not be achieved. Furthermore, the 
Markov chains in this case are not even geometrically ergodic by not satisfying the drift condition shown in (l73ll 
[32], which is clearly explained in Section V. Therefore, a < a should be avoided in practice and the corresponding 
convergence analysis is omitted here. 

On the other hand, in the case of <7 > cr, let d( A, c) denote the Euclidean distance between the lattice A and c, 
that is 

d(A,c) = min ||Bx — c||, (31) 

x£Z n 


then it follows that 


«(*) 


> 


P(T,C (A-) 


_( i _ i W(A,c ) 2 
• £ v 2— 2 2 <j z ’ v ’ ' 


(32) 


*(x) - nr=ife(^) 

for all x G Z ra , which means the underlying Markov chain is uniformly ergodic by satisfying ( IT6l ) in Lemma 1. 
More precisely, q{x)/ 7r(x) could be expressed as 


?(*) 


> 


Per, c ( A) 


~ mu pm 


■p 


where 


p = mu auu . a,c ) 2 

p nr=i p*(z) 


(33) 


(34) 


Clearly, parameter /3 becomes the key to govern the convergence performance. Compared to ( IT71 ). if /3 > 1, the 
convergence of the Markov chain will be boosted by a larger size of 5, otherwise the convergence will be slowed 
down. However, in the case of a > a, it easy to check that the value of 6 is monotonically decreasing with the 
given <7, rendering f3 > 1 inapplicable to the most cases of interest. 

As can be seen clearly from Fig. [2] the convergence rate can be enhanced by /3 > 1 only for a small enough 
a (e.g., <7 2 < 0.398, e.g., —4 dB), thus making the choice of a = a (i.e., f3 = 1) reasonable to maintain the 
convergence performance. This essentially explains the reason why the independent MHK algorithm is proposed 
with <7 = a as a default configuration in general. 


IV. Convergence Analysis 

In this section, convergence analysis about the exponential decay coefficient 6 in the independent MHK algorithm 
is performed, which leads to a quantitative estimate of the mixing time. For a better understanding, the analysis is 
carried out in two cases respectively, namely, c = 0 and c ^ 0. 


A. Convergence Rate (c = 0) 

Lemma 1 shows that the ratio g(x)/7r(x) in the independent MHK sampling algorithm is lower bounded by 
a constant 5. We further derive an explicit expression of the coefficient <5 due to its significant impact on the 
convergence rate, for the case c = 0. 
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Fig. 2. Coefficient /3 of Eg lattice in the case of a > a when c = 0. 


Specifically, we have 


g(x) 

7T(X) 


Pc r,o(A) 


nr=iPa 4 ,x 4 (z) 


§ ExGZ" e 2 " 


II Bx II 


nr=iP^(z) 

(c) @a( 2 ^t) 


nr=i©z(^) 

(d) ©A(i) 


n;u^( 4 ) 


= 5. 


(35) 


Here, for notational simplicity, s = y/2na and s 4 = V^ncn = s/||bj|| are applied in the equations. In (6), the 
inequality p CTii j(Z) < (Z) shown in (fl 8 l) is used again. Theta series 0 a and Jacobi theta function $3 are applied 

in (c) and (d) respectively, where 

0a(t) = 5>-^H a " 2 , (36) 

AeA 


+00 




(37) 


with 0z = 


Proposition 2. If s > y/w(logn) • maxi<j< n ||b* || or s < y/uAjogn) ■ mini<j< n ||bj||, then the coefficient 
S~ 1. 


Proof: To start with, let us recall the flatness factor a, which is defined 

det(B) 1 

( } (a/WP° a( 27R7 2) 


(38) 
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and 


e A (a)=e, if a = r] e (A). 


(39) 


Here, r) s (A) is known as the smoothing parameter in lattices, and if r/ e (A) < y^logn) ■ maxi<j<„ ||b, ||, then e 
will become negligible (Lemma 3.3, El). 

Therefore, the exponential decay coefficient 5 given in (l35l > can be expressed as 


<5 = 


©a(^) 




det(B) -1 • (y/2tra) n ■ [ea(ct) + 1] 
n"=i ^ • [ei(cri) + 1] 
ea (a) + 1 


nr =1 M^) + i]’ 


(40) 


where det(-) denotes the determinant of a matrix. 

From (l38l) . it is easy to verify that the flatness factor ea is a monotonically decreasing function of a, i.e., for 
err > cr 2 , we have ea(cti) < ea^)- Therefore, from (f39l) . let a > rj e (A), then the flatness factor ea(ct) will be 
upper bounded as 

ea(ct) < e. (41) 


Furthermore, assume a > ^/w(log?r) -maxi<j< n ||b, ||, then ea(ct) will approach 0 since its upper bound e becomes 


negligible. Meanwhile, it is easy to check that the same thing also happens to ez {af) for a > ^/w(logn) 
maxi<i< n llbjll. Hence, we have 


6 = 


ea(ct) + 1 


ir=iM^)+i] 

if cr > sj w(log n) ■ maxi<j< n ||bj||. 

On the other hand, according to Jacobi’s formula 


1 


0 A (r) = Idet(B)!- 1 0 A *^ 

the expression of the flatness factor shown in ( l38l ) can be rewritten as 

e A (cr) = 0a*(2 ttct 2 ) - 1, 


where A* is the dual lattice of A. Then, we have 


<5 = 


QA ( 27r 1 (T 2 ) 
£a *( 2 Ff) + 1 


n 


2=1 L 


(557)+1]’ 


(42) 


(43) 


(44) 


(45) 


where $3 = 0%* = Oz- 
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With respect to ca* ( ) an d i n 63- similarly, it follows that if 


-— > ^w(logn) • max ||b*||, 

Z7TCT l<i<n 

then both 6 a* (t^—) an d ez*(j^-) will become negligible, and we have 


1 , 


where b*’s are the Gram-Schmidt vectors of the dual lattice basis B*. 
According to (l46t , it follows that 


er < sJuj{\ogn) 1 ■ (max ||b*||) 

\1 <i<n J 


(e 


-1 


= a/ w(logn) 


-1 


max (||b„-i+i|| ) 

l<«<n 


= ^(log”) 


min ||bi| 

1 <i<n 


-1 


A/w(logn) • mm llbiH, 

l<^<n 


where (e) comes from the fact that 


IbJUHIbn-i+rir 1 . 


(46) 


(47) 


(48) 


(49) 


Therefore, the proof is completed. ■ 

Obviously, according to Proposition 2, as s goes to 0 and oo on both sides, the coefficient 5 will converge to 1. 
We remark that this is consistent with the fact that the Klein’s algorithm is capable of sampling from the lattice 
Gaussian distribution directly with a > ^/^(log n) • maxi <i< n ||bj||. 

Proposition 3. If s < mini<j< n ||bj||, then the coefficient 5 is lower bounded by 

5 > 1.086" n - 0 a (4). (50) 

s z 

Meanwhile, if s > maxi<j< n ||bj||, then the coefficient S is lower bounded by 

5 > 1.086“ n • 0 A *(s 2 ). (51) 


Proof: By definition, we have 


03 ( 1 ) 


+oo 

E 


n =—oo 



1.086 




(52) 


where r(-) stands for the Gamma function. As the Jacobi theta function $3(7") is monotonically decreasing with 


*It is worth pointing out that the explict values of $ 3 ( 2 ), # 3 ( 3 ), ...can also be calculated, where the same derivation in the following can 
also be carried out. Here we choose $ 3 ( 1 ) as the benchmark due to its simplicity. 
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r, let l/s| > 1, namely, s < ||||, then it follows that 

0 3 ( 4 ) - ^ L086 - (53 ^ 

S i 

Assume 5 < mini<i< n ||bi||, then the following lower bound for 5 can be obtained, 

i = nEf%- 1 086 '“' eA( ^ ,54) 

On the other hand, as Z is a self-dual lattice, i.e., Z = Z*, then if sl> 1, namely, s > ||bi||, it follows that 

#* 3 (s 2 i ) = Msi)<Ml)< 1-086. (55) 

Therefore, let s > maxi |b, ||, according to Jacobi’s formula shown in ( 143b . 5 can be lower bounded as 

Qa(?) 

|det(B)| _1 (s 2 )50 A *(s 2 ) 

©A *(s 2 ) 

n;u«) 

1.086 _n • 0 A » (s 2 ), (56) 



completing the proof. ■ 

Remark: We emphasize that the significance of lattice reduction (e.g., LLL or HKZ) can be seen here, as increasing 
mini<j< n ||b,|| and decreasing max] ||b, || simultaneously will greatly enhance the convergence performance 
due to a better lower bound of S. 

Next, with respect to the range of mini<i<„ ||b;|| < s < maxi<i< n ||b*||, we arrive at the following proposition. 
Proposition 4. 7/'min 1 <j< rl ||bj|| < s < max 1 < i < rl ||b, ||, then the coefficient S is lower bounded by 

(5 > 1.086- (n - m) • 2~ m ■ • ©a ( 4V (57) 

s m \s / 

where I denotes the subset of indexes i with Si > 1 (i.e., s > ||bj||), i £ {1, 2,..., n}, |/| = to. 


Proof: From the definition, we have 


+oo 


M t ) = e 77 

n =—oo 

= l + 2^i 


n> 1 

POO 

< 1 + 2 / e" 

J o 

if] , n 


dx 


(58) 
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TABLE I 

Value of <5 with respect tos = \Z2tto- in the independent MHK algorithm. 


s<[\/27rw(logn)] 1 - min ||bj|| 

l<i<n 

Sail 

s < mm ||bj|| 

l<i<n 

d> l.O86- n -0 A (^r) 

min ||bi||<s< max ||bi|| 

l<t<n ±<i<n 

1.086 _ ( n_m l-2 _m - nie/ J bi|1 -0a (A) 

s > max II b ? -|| 

1 <i<n 

5> 1.086 _n • 0 A * (s 2 ) 

s > v/27ru;(logn) • max b; | 

l<i<n 

fwl 


where (/) holds due to the Gaussian integral e ax2 dx = yj\- 
Hence, for terms with 1/sf < 1, namely, s > ||b,||, we have 


03 



< 1 + | Si | < 2 Si 


Therefore, from (l53l) and < [59b . if follows that 



(59) 


11 



< 1.086 (n - m) • 2 m • 


n,e/ l|bi 


(60) 


completing the proof. ■ 

To summarize, the value of S with respect to the given s = s/Zncr in the independent MHK algorithm is given 
in Table I. 

Now, let us consider some lattices whose theta series are more understood. 


Proposition 5. The coefficient 6 = ^ .for an isodual lattice A has a multiplicative symmetry point at 

s = 1, and asymptotically converges to 1 on both sides when s goes to 0 and oo. 


Proof: 

An isodual lattice is one that is geometrically similar to its dual |34| . Here, we note that the theta series 0 a 
of an isodual lattice A and that of its dual A* are the same, i.e., 0a(t) = ©a* (t), and the volume of an isodual 
lattice |det(B)| naturally equals 1. Therefore, we have 

0 a(^) = s”0a(s 2 ), (61) 

$3 = Sii9 3 (sf), (62) 
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Fig. 3. Coefficient 1/5 of the Eg lattice in the case of c = 0. 


then from (1511 and ( |62| >, the symmetry with respect to s = 1 can be obtained as follows, 

© A ( p -) s u 0a(s 2 ) 

nr=i^) “ nr=i^ 3 («?) 

©a(s 2 ) 

nr=i^3(*i) 

0 A (s 2 ) 

iept ECU *>(«?) 

0a(s 2 ) 

ii:-,^^ 2 )- 


By definition, it is straightforward to verify that 


0a(t?) 

nr=i^ 3 (^) 


—> 1, when 


s —>■ 0. 


(63) 


(64) 


Then because of the symmetry, pp ^ will also asymptotically approach 1 when s —> oo, completing the 

proof. ■ 

Examples of the coefficient 1/5 for the isodual E% and Leech lattice are shown in Fig. 0 and Fig. [4] respectively. 
It is worth pointing out that 1/5 has a maximum at the symmetry point s = 1, namely a 2 = t/z- Actually, 1/5 
is similar to, but not exactly the same as the secrecy gain defined in l34l . In our context, 1/5 roughly estimates 
the number of the Markov moves required to reach the stationary distribution. On the other hand, as for non¬ 
isodual lattices, D 4 lattice is applied to give the illustration in Fig. [3 where the symmetry still holds but centers 
at s = 0.376. Therefore, with the exact value of 5, the explicit estimation of the mixing time for the underlying 
Markov chain can be obtained. 
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Fig. 4. Coefficient 1/5 of the Leech lattice in the case of c = 0. 


B. Convergence Rate (c f ()) 

As for the convergence analysis in the case of c ^ 0, we firstly define the exponential decay coefficient 

c, = = P<t,c(A) 

?r( x ) nr=lP«r t ,x t (Z)’ 

then we have the following proposition. 

Proposition 6. For any c £ R" and c / 0, one /i«,v 

, d 2 (A,c) 

S' > e • (5. (66) 


S' as 
(65) 


Proof: Specifically, we have 


e -^dl Bx -<*( A >c) 


Pa, c(A) — Pa,c modA ( A ) 

= £ 

x£Z n 

= I ^ ^ e -^HBx-d(A,c )|| 2 +e -^||Bx+d(A,c)|| 2 ^ 
x£Z n 

' 2 " '.^ e -^ll B x|| 2 .I.^-^<Bx,d(A,c)> +e -^<Bx,<i(A,c)^ 


= e 2 


x£Z n 


W _d{A^ „-1 ||Bx|| 2 

> e -^r- • 2^ e 

x(EZ n 

d(A,c) 

= e_ 2 - 2 • /v(A), 


(67) 


where (*) follows from the fact that for any positive real a > 0, a + 1/a > 2. 


rf(A,c) 


Thus, the value of S' is reduced by a factor of e 2 ^ from <5. Clearly, if c = 0, then S' = S, implying c/0 
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Fig. 5. Coefficient 1/5 of the D 4 lattice in the case of c = 0. 


is a general case of c = 0 q Hence, according to ( 1671 ). as long as c is not too far from A, S' has a similar lower 
bound with S. 


V. Symmetric Metropolis-Klein Algorithm 

In this section, we propose the symmetrical Metropolis-Klein (SMK) algorithm for lattice Gaussian sampling. 
The underlying Markov chain is proved to be geometrically ergodic, which not only converges exponentially fast, 
but also depends on the selection of the initial state. 


A. Symmetric Metropolis-Klein Algorithm 


The Metropolis algorithm can be viewed as a special case of the MH algorithm by utilizing a symmetric proposal 
distribution q(x., y) (30). In the proposed algorithm, we use Klein’s algorithm to generate the symmetric proposal 
distribution. By doing this, the generation of the state candidate y highly depends on the previous state x, thus 
leading to more effective sampling compared to the independent one. Moreover, since the chain is symmetric, the 
calculation of the acceptance ratio a is also greatly simplified. 

Specifically, as shown in Algorithm 3, its sampling procedure at each Markov move can be summarized by the 
following steps: 

1) Given the current Markov state X t = x, sample from the symmetric proposal distribution through Klein’s 
algorithm to obtain the candidate state y for X t+ i, 


q(x,y)=-- 


P«r,Bx(By) 


e (j) 

= q{ y,x), 


( 68 ) 


miiP^(z) nr=r a™cz) 

where yi = c * , c' = Q' Bx and B = QR. Note that equality (j) holds due to the inherent symmetry 

stated in Lemma 2. 


2 In fact, as Pcr,c( A) is periodic, all c E A will lead to d( A, c) = 0, thus corresponding to the case of c = 0. 
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Algorithm 3 Symmetric Metropolis-Klein Algorithm for Lattice Gaussian Sampling 
Input: B.cr, c,X 0 

Output: samples from the target distribution 7r = -Da.<t,c 
l: for t =1,2, ..., do 
2: let x denote the state of X t _i 

3: generate y by the proposal distribution q(x.,y) in (l68l > 

4: calculate the acceptance ratio a(x,y) in (f69l) 

5: generate a sample u from the uniform density U[ 0,1] 

6: if u < a(x, y) then 

7: let X t = y 

8 : else 

9: X t = X 

10: end if 

11 : if Markov chain goes to steady then 

12: output the state of X t 

13: end if 

14: end for 


2) Calculate the acceptance ratio a(x, y) 

' 7r(y)q(y,x) | _ . J 

. ’ 7r(x)9(x, y) J ““j ’tt(x) 
tin {1, e^(ll Bx-c ll 2 -H By-c ll 2 )} j 


a = min 


(y) 


= min • 


where tt = Da.o-.c- 

3) Make a decision for X t+1 based on a(x, y) to accept X t+1 = y or not. 


Lemma 2. The proposal distribution q shown in A68 1 ) is symmetric, namely. 


g(x,y) = q(y, 


(69) 


(70) 


for all x, y £ Z n . 

The proof of Lemma 2 is provided in Appendix [Cl 

Intuitively, at each Markov move, the state candidate y for X t+ i is sampled based on a Gaussian-like distribution 
centered at the point made up by the previous Markov state x. Given the acceptance ratio shown in d69l l. it is quite 
straightforward to see that if By is closer to the given point c than Bx, then state candidate y must be accepted 
by X t+1 due to a = 1, otherwise it will be accepted with a probability depending on the distance from By to c, 
thus forming a Markov chain []. 

Theorem 3. Given the target lattice Gaussian distribution a c , the Markov chain induced by the proposed 
symmetric Metropolis-Klein algorithm is ergodic: 

3 A query about the SMK algorithm is whether a flexible standard deviation a in the proposal distribution q works, i.e., 7r ^ <j. The answer 
is yes. However, since the explicit convergence rate is tedious to get, we omit the corresponding analysis here. 
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lim||P‘(x;-) 


Pa,<t,c(-)I|tV = 0 


for all states x € Z ra . 

The proof of the Theorem 3 is similar to that of the Theorem 1, which is omitted here. 


(71) 


B. Geometric Ergodicity 

In MCMC, a set C C El is referred to as a small set , if there exist k > 0, 1 > 8 > 0 and a probability measure 
v on El such that 

P k {x,B) > Sv(B), Vx g C (72) 

for all measurable subsets B C El. This is also known as the minorisation condition in literature 1281 . Actually, the 
uniform ergodicity can be achieved as a special case of the minorisation condition with C = El. For a bounded 
small set C, the drift condition of Markov chains is debited as follows (25): 


Definition 4. A Markov chain satisfies the drift condition if there are constants 0 < A < 1 and b < oo, and a 
function V : El —> [l,oo], such that 

[ P(x, dy)V (y) < XV (x) + 61c(x) (73) 

JO. 

for all x € El, where C is a small set, lc( x ) equals to 1 when x € C and 0 otherwise. 


In principle, the drift condition is the standard way to prove geometric ergodicity (36). Here, we recall the 
following theorem to show the relationship between drift condition and geometric ergodicity. 


Theorem 4 ((25)). Consider an irreducible, aperiodic Markov chain with stationary distribution ir, if the drift 
condition shown in m is satisfied for any small set C C El with 5 > 0, then the chain is geometrically ergodic. 


Here, we highlight that the discrete version of the drift condition still holds for discrete state space Markov chains 
E). According to Theorem 4, now we check whether the drift condition is satisfied in the proposed algorithm. 
Then, we arrive at the following theorem. 


Theorem 5. Given the invariant lattice Gaussian distribution D\ a ^ c , the Markov chain established by the sym¬ 
metric Metropolis-Klein algorithm satisfies the drift condition. Therefore, the proposed symmetric Metropolis-Klein 
algorithm is geometrically ergodic. 

Proof: By definition, for any ||Bx — By|| < K, where K > 0 is a constant, the proposal distribution q(x,y) 
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small set C 



Fig. 6. Illustration of cases (a) x ^ C and (b) x £ C in the Markov move induced by SMK. The blue dash curve represents the area of the 
small set while the red solid curve denotes the acceptance region A x . 


can always be lower bounded by a constant e > 0 as follows. 


?(x,y) > 


(fc) 

> 


_ K z 

e 2 ^ 


nr=i pcn&iz) 

_ K 2 

e 2 ^ 

nr=i^(z) = 


(74) 


where (k) holds due to dTH i. 

Since q(x, y) can always be lower bounded and 7r(x) is bounded away from 0 and 1 on any bounded sets, every 
non-empty bounded set C C Z" is a small set: 


^(x,y) = g(x, y) • a(x,y) > 


7r 


(x) 


(75) 


for all x£C. Next, in order to specify the small set C, we define 


C = {x € Z“ : tt(x) > 


(76) 


where d > 1 is a constant. 

At each Markov move, given the acceptance ratio a shown in ( l69l ). the acceptance region A x and the potential 
rejection region /i x for the chain started from x are defined as 


A x = {y £ Z n \n(y) > tt(x)}; 

(77) 

7?x = {y G Z"|7r(y) < 7r(x)}. 

(78) 


Apparently, state candidate y € A x will be accepted by X t+ i without uncertainty while state candidate y £ /' x 
has a certain risk to be rejected. Then, based on A x and R x , the discrete term J] ygZ n P(x, y)V(y) on the LHS 
of (1731) can be expressed as (l80l) . Next, let V(x) = 7r(x) - ^. We will discuss the two cases x ^ C and x £ C, 
respectively. 
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E p (^y) v (y)= E p (^y) v (y)+ E p (^y) v (y) 

yeZ“ yeA x ye.R x 

= E g(x,y)V(y) + E <?(x,y)^V(y)+ E«( x ’ y ) f 1 

yeA x y e-R x ^ y eR x 


^(y) 

7T(X) 


y(x). 


E p (*,y)V(y) 

y£Z n 


V(x) 


E 9 (x,y) 

y£2l x 


EM 

V(x) 


+ E «( x > y ) 

y£-Rx 


7r(y) _ ^(y) 

7r(x) V(x) 


+ E «( x ’ y ) 1 

yeflx 


^(y) 

7T(X) 


E«( x ’ y ) 

y£A x 


'EM 

T(x) 


1 +E^y) + E ®( x ’ y) 

J y£-4x ye-Rx 


7r(y) _ ^(y) 

7 r(x) V(x) 


+ E g(x,y)-E ?( x -y) 

yefix yeflx 


= i- E ?( x >y) 

yeA x 


EMi 

7r'(y)5 


+ E <?( x ’ y ) 

yefix 


n'(y)* 

n'(x)i 


^(y) 

7r'(x) 


e 


(i). In the case of x ^ C, i.e., V"(x) > d, we have the following derivation shown in (|8T1 >. where 7 r'(x) = 
'^rll Bx_c ll _ jt is easy to verify that 


lim l(x) ■ Vlog 7 r'(x) = — oo, (82) 

l|x||—»-oo 

where l(x) = ^j- denotes the unit vector and V represents the gradient. This condition implies that for any 
arbitrarily large 7 > 0, there exists R > 0 such that 


7r'(x + a ■ l(x)) 
7r'(x) 


< e- a 'T 


(83) 


where ||x|| > R, a > 0. In other words, as ||x|| goes to infinity, n' is at least exponentially decaying with a rate 7 
tending to infinity. Therefore, once ||x|| is large enough, e.g., ||x|| — ► 00, even a minimum discrete integer increment, 
namely, AgZ" and || A|| = 1 , can make 7 r'(x + A) become extremely smaller than 7 r'(x). 

Now, let yi = x + A £ R x , with large enough ||x||, the ratio of n'(yi)/n'(x) could be arbitrarily small, that is 


^(yi) 

7r'(x) 


—> 0 for ||x|| —>■ 00 . 


(84) 


As yi is the closest candidate to x in set R x by the minimum integer increment A, then the following relationship 
holds due to exponential decay of 7 t' 


7 r'(y 2 ) < Tr'(yi) for ||x|| -> 00, 


(85) 


where y 2 £ R x , y-i / yi denotes another candidate in set R x . Consequently, we have 

7r '(y2) _ Tr'(yi) 


< 


-> 0 for x 


7r'(x) 7t'(x) 

which means the third term of RHS of (| 8 H can be made arbitrarily small by large enough ||x|| 


( 86 ) 
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Similarly, the same derivation can be made for the case of y G A x . Since x is also contained in A x , care must 
be taken to perform the analysis and we have 


7r'(x) 7 t'(x) 

7T'(y4) < 7r'(y 3 ) 


—¥ 0 for ||x|| —»• oo, 


(87) 


where y 3 = x — A G A x stands for the closest non-x candidate in A.,, to x and y 4 represents any other candidate 
in A x except y 3 and x. 

In the case of x ^ C, since the indicator function lc( x ) is 0, A can be expressed directly as the ratio between 
]T]P(x,y)TA(y) and V(x). Therefore, based on (186} and (187b . as ||x|| goes to infinity, we have 

yeZ" 


A = lim sup 


x ->oo 


£P(x,y)V(y) 

yez n 


= 1 - lim inf V <?(x,y) 

v _ Sm-v-v ‘ 


ye^4x 


1 - 


7r'(x) 2 
7r'(y)5 


= 1— lim inf o(x, y) 


yeA x ,y^x 


(0 

< 1, 


( 88 ) 


where inequality (/) holds due to the following derivation 


lim inf ^ q(x, y) 


||x||—foo 


y£^x,y/x 


lim inf > 

xll—>00 2 -—^ 

11 y£A x ,y#x 


e -^rl|Bx-Byf 

rii=i p<ri,viffi) 


> 0. (89) 

Therefore, according to d88t . the drift condition shown in (f73T> is satisfied for x (f_ C. 

(ii). As for the case of x G C, function V for y G /l x is upper bounded as 

V(y) < V(x) < d (90) 

due to A x C C. Therefore, the first term in the RHS of (l80b always can be upper bounded by a constant b > 0, 
namely, 

^2<l(.x-,y)V(y) < b < oo. (91) 

y£A x 

Then as ||x|| goes to infinity, based on (l86l and ([87}, it follows that 

limsupV -P(x, y)V(y) < b+ lim supV g(x, y)V(x) 

* H-oo x -too 

yez n y e-Rx 

= 6 + AF(x), (92) 
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and it is easy to verify that 

A = lim sup V g(x,y) < 1, (93) 

x 

y e-Rx 

completing the proof. ■ 

In essence, the convergence of the geometric ergodicity can be classified into two stages. On one hand, for x </ C, 
due to A < 1, the drift condition guarantees the Markov chain shrinks geometrically towards the small set C. In 
fact, as shown in Fig. [6] given x </ C, by symmetry, the probability of y locating in f? x is always larger than that 
in A x . Put it another way, the size of YlyeA y / x ( /( x - y) wou ld be always smaller than 1/2, which means the 
coefficient A < 1 shown in ( |88| > is always lower bounded by 


A 



(94) 


Apparently, A increases gradually with the decrease of ||x|| and vice versa. 

On the other hand, if x S C, the minorisation condition shown in ( 1721 ) implies the Markov chain will converge 
to the stationary distribution exponentially fast. Specifically, with respect to C, the exponential convergence can be 
demonstrated via coupling technique and <5 is just the exponential decay coefficient. However, as S is determined by 
C, rendering it flexible to the different settings of C. In ITT) . Rosenthal pointed out that for C = {x : V(x) < d} 
and d > 26/(1 — A), Markov chains satisfying the drift condition will converge exponentially to the stationary 
distribution as 

r n (x or) - 4-)IItv'<(1 -sr+ (^fJ^ + Y^+Hxoj), (95) 


where 0 < r < 1, 


1 + d 

l + 2b + Xd 


and U = 1 + 2 (d + b ). 


( 96 ) 


Clearly, there is a trade-off between these two convergence stages: a larger set C indicates a smaller <5 in the 
minorisation condition for x € C but a faster shrink speed A towards C for x </ C (close to 1/2 when ||x|| — > oo). 
However, the size of C, measured by d here, is determined artificially, making both S and A not only flexible but also 
sensitive to the slight change of d. Moreover, even with a specific C, A is still difficult to get. Therefore, although 
geometric ergodicity can be achieved by the proposed SMK algorithm, it is difficult to obtain the quantitative bounds 
on 5 and A. Even though simulation techniques can be applied to make 5 and A accessible, the complexity is too 
computationally expensive to afford in practice ||38l . 

According to ( |95 | i. it is worthy to point out that the convergence of the Markov chain arising from the SMK 
algorithm also highly depends on the starting state xo, which follows the definition of geometric ergodicity given 
in m . In theory, xo could be any candidate from the state space but a poor choice may intensively increase the 
required mixing time. To this end, it is widely suggested to start the Markov chain with xo as close to the center 
of the distribution as possible. This is actually in accordance with the result shown in ( l95l ). implying the solution 
of the CVP between Bx to c is the optimal choice for lattice Gaussian sampling. For the consideration of the 
complexity, Babai’s nearest plane algorithm is recommended here as a preprocessing stage to output the Babai’s 
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point as x 0 (39) . 


VI. Conclusions 

In this paper, two MH-based algorithms were proposed to sample from lattice Gaussian distributions. As the 
proposal distribution in the MH algorithms can be set freely, an independent proposal distribution and a symmetric 
proposal distribution are exploited respectively for the geometric convergence performance. In addition, because the 
state space of the Markov chain in lattices is countably infinite, convergence analysis starting from the ergodicity 
turns out to be necessary. It is proven that the Markov chain arising from the independent MHK algorithm 
is uniformly ergodic, thus leading to an exponential convergence performance regardless of the starting state. 
Furthermore, we show its convergence rate can be explicitly calculated via theta series, implying a tractable 
mixing time to the stationary distribution. On the other hand, the proposed SMK algorithm is demonstrated to 
be geometrically ergodic, where the selection of the starting state matters. Due to the inherent symmetry, it not 
only converges exponentially fast, but also is greatly simplified for the implementation. 
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Appendix A 
Proof of Theorem 1 

Proof: It is easy to verify that the Markov chain with the independent proposal distribution q shown in (fl3l > is 
irreducible and aperiodic. Therefore, according to the definition of ergodicity, the key in proof of ergodicity turns 
out to show the underlying Markov chain is positive recurrent (26). 

Definition 5. An irreducible Markov chain with transition matrix P is positive recurrent if and only if there exists 
a probability distribution 7r on Cl such that 7r = 7rP. 

In principle, the Markov chain produced by the independent MHK algorithm is inherently reversible with respect 
to 7T, since 


7r(x)P(x,y) 


7t(x)g(x,y)a(x,y) 

min{7r(x)g(y),7r(y) 9 (x)} 

7r(y)-P(y,x)> 


(97) 


where the assumption y ^ x is sufficient because the above equation holds trivially in the case of y = x. 
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Next, based on the reversibility, by fixing a state y and summing over all states, we have 

5>(x)P(x,y) = 5>(y)P(y,x) 


x£Z' 

7 r 


(y)EP(y> X ) 

x£Z n 

7r(y)i 


(98) 


where 7 r= 7 rP therefore can be easily obtained. Therefore, complete the proof. 


Appendix B 

Proof of Proposition 1 

Proof: To start with, let us recall the definition of conductance (also known as bottleneck ratio ) in Markov 
chains 


Definition 6. The conductance <1» of a Markov chain is defined as 

*(S) = min Q{S ^ C) , (99) 

SCfi,7r(S)<l/2 7 t(6) 

where subset S c stands for the complement set of S (i.e., S[JS C = Cl, S'Q 5° = 0), and the edge measure Q is 
defined by 

Q{x, y) = n(x)P(x, y) (100) 


and 


Q{S, S c ) = 52Q(z,y). 

xeS,yeS c 


( 101 ) 


It is this value 0 < $ < 1 that has been used to bound the spectral gap 7 of Markov chains. More precisely, in 
the independent MHK algorithms, we have 

Ex€S,y€S« 7r ( x ) P ( x >y) 


$ = min 

SCfi,7r(S)<l/2 

(m) 

> min 

SCQ,7t(S)<1/2 


mm 

SCfi,7r(S)<l/2 


7r(S) 

Exes, ygS■= 4*) • fa (y) 

7T (S) 

6 ■ Exes *r( x ) - Eyes- ^(y) 


t(S) 


min S • 7 r(S c ) 

SCf2,7r(S)<l/2 


“ 2 ’ 


( 102 ) 


where inequality (to) holds due to fact from (| 2 H . 
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Next, by invoking the cheeger inequality m of Markov chains shown below 



$ 2 

— <7<2$, 

(103) 

we have 

> 52 

7 > — , 

~ 8 

(104) 

completing the proof. 

Appendix C 

■ 


Proof of Lemma 2 


Proof: According to the QR-decomposition B = QR, we have 


g(x,y) = 

e -^l|Bx-By|| 2 e -^||R X -Ry|| 2 

(105) 

rii=i Pa. ,vi (^) n i= i p°i ,vi (^) 

by removing the orthogonal matrix Q, where yi = —- J " 1+1 l ' 3 3 , c' = Rx. 


Specifically, the term p ai y H (1) in the denominator of (|105| can be expressed as 



1 C ’i E j = i + 1 r i,i y i \2 

r M ) 


= 

E e ‘ 

Vi 


= 

, , (*4-1/4+ E ^f(*i-w)) 2 

E e * 

ViSZ 


= 

E e * 

*i€Z 


= 

P&i, </>(^)> 

(106) 

n 

where Zi = y t - x t and </> = J2 j^(xj - 

j=i +1 

Clearly, we can easily get that 

- Vi)- 


Pc7 U Xi (%) 

. . - i A r (l/4-*4+ E 'ffriVi-Xj)) 2 

= E e i i=i+1 

XiGZ 



\ ’ -■^■( 2 *- < f’) 2 

= E e * 

ZiGZ 



= PaiA Z ) = P<R,Si( Z )> 

(107) 

where x-i = ——^~' 3 = 1+1 rt ' jX:> ; c" = Ry. 
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